hadoop3 cannot create timeline server hbase tables

I read the Hadoop 3 documentation about Timeline Server v.2, and it says:
Finally, run the schema creator tool to create the necessary tables:
bin/hadoop org.apache.hadoop.yarn.server.timelineservice.storage.TimelineSchemaCreator -create
The TimelineSchemaCreator tool supports a few options that may come handy especially when you are testing. For example, you can use -skipExistingTable (-s for short) to skip existing tables and continue to create other tables rather than failing the schema creation. By default, the tables will have a schema prefix of “prod.”. When no option or ‘-help’ (‘-h’ for short) is provided, the command usage is printed.
but I cannot find the TimelineSchemaCreator class in the package org.apache.hadoop.yarn.server.timelineservice.storage in any of the Timeline Server jars. Why? Is the documentation out of date?
# find /opt/ -name 'hadoop-yarn-server-timelineservice*jar'
/opt/hadoop-3.1.1/share/hadoop/yarn/timelineservice/hadoop-yarn-server-timelineservice-3.1.1.jar
/opt/hadoop-3.1.1/share/hadoop/yarn/timelineservice/hadoop-yarn-server-timelineservice-hbase-common-3.1.1.jar
/opt/hadoop-3.1.1/share/hadoop/yarn/timelineservice/test/hadoop-yarn-server-timelineservice-hbase-tests-3.1.1.jar
/opt/hadoop-3.1.1/share/hadoop/yarn/timelineservice/hadoop-yarn-server-timelineservice-hbase-coprocessor-3.1.1.jar
/opt/hadoop-3.1.1/share/hadoop/yarn/timelineservice/hadoop-yarn-server-timelineservice-hbase-client-3.1.1.jar

I found it in hadoop-yarn-server-timelineservice-hbase-client-3.1.1.jar.
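For what it's worth, a quick way to check which jar actually contains the class is to list the jar entries, for example with a small Python sketch (the jar path is the one from the find output above):

import zipfile

jar = '/opt/hadoop-3.1.1/share/hadoop/yarn/timelineservice/hadoop-yarn-server-timelineservice-hbase-client-3.1.1.jar'
# print every entry in the jar whose name mentions TimelineSchemaCreator
for name in zipfile.ZipFile(jar).namelist():
    if 'TimelineSchemaCreator' in name:
        print(name)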

Related

Best practices when migrating data from one database schema to another?

Oftentimes when I am working on a project, I find myself looking at the database schema and having to export the data to work with the new schema.
Lots of times there has been a database where the stored data was fairly crude. What I mean by that is that it's stored with lots of unfiltered characters. I find myself writing custom PHP scripts to filter through this information and create a nice clean UTF-8 CSV file that I then reimport into my new database.
I'd like to know if there are better ways to handle this.
You can consider Logstash.
Logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (such as for searching).
Logstash treats every single event/log like a pipe: input | filter | output.
Logstash has many input plugins to accept different sources/formats, and you can use filters to parse your source data and then output to whichever outputs/formats you need.
There's no single answer to this one, but I once needed to quickly migrate a database and ended up using sqlautocode, which is a tool to auto-generate a (Python ORM) model from an existing database; the model uses the great SQLAlchemy ORM library. It even generates some sample code to get you started (see below).
Amazingly, it worked out of the box. You do not get a full migration, but an easy way to programmatically access all your tables (in Python).
I didn't do it on that project, but you could of course auto-generate your ORM layer for the target DB as well, then write a script which transfers the right rows over into the desired structure.
Once you get your DB content into Python, you will be able to deal with u'unicode', even if it takes some attempts, depending on the actual crudeness...
Example code:
# some example usage (legacy SQLAlchemy / Python 2 style, as generated by
# sqlautocode; the model module name below is hypothetical)
from sqlalchemy import create_engine
from dbname_model import metadata, customers

if __name__ == '__main__':
    db = create_engine(u'mysql://username:password@localhost/dbname')
    metadata.bind = db
    # fetch the first 10 rows from the customers table
    s = customers.select().limit(10)
    rs = s.execute()
    for row in rs:
        print row
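If you go the "transfer script" route mentioned above, a minimal sketch against the current SQLAlchemy API might look like the following; the connection URLs and the 'customers' table are hypothetical, and the per-row clean-up is just a placeholder:

# transfer rows from the old schema to the new one (sketch, hypothetical names)
from sqlalchemy import create_engine, MetaData, Table, insert, select

source = create_engine('mysql://user:password@localhost/old_db')
target = create_engine('mysql://user:password@localhost/new_db')

# reflect the table definitions from both databases
src_customers = Table('customers', MetaData(), autoload_with=source)
dst_customers = Table('customers', MetaData(), autoload_with=target)

with source.connect() as src, target.begin() as dst:
    for row in src.execute(select(src_customers)):
        values = dict(row._mapping)
        # clean / rename / reshape the values here to fit the new schema
        dst.execute(insert(dst_customers).values(**values))

Any cleaning of the crude data can then happen in Python between the read and the write.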
I would suggest using an ETL tool, or at least following ETL practices when moving data. Considering that you are already cleaning, you may follow the whole ECCD path -- extract, clean, conform, deliver (a small scripted sketch of this flow is shown below). If you do your own cleaning, consider saving intermediate CSV files for debugging and audit purposes.
1. Extract (as is, junk included) to file_1
2. Clean file_1 --> file_2
3. Conform file_2 --> file_3
4. Deliver file_3 --> DB tables
If you archive files 1-3 and document versions of your scripts, you will be able to backtrack in case of a bug.
ETL tools -- like Microsoft SSIS, Oracle Data Integrator, Pentaho Data Integration -- connect to various data sources and offer plenty of transformation and profiling tasks.
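If you script the ECCD stages yourself, a minimal sketch of the file_1 -> file_2 -> file_3 flow could look like the one below; the clean and conform rules are hypothetical placeholders to adapt to your data:

# extract / clean / conform as separate stages with intermediate CSV files
import csv

def clean(row):
    # strip junk characters and control bytes -- adjust to your data
    return [col.strip().replace('\x00', '') for col in row]

def conform(row):
    # map/reorder columns to the target schema (hypothetical column order)
    return [row[0], row[2], row[1]]

def run_stage(in_path, out_path, transform):
    with open(in_path, newline='', encoding='utf-8', errors='replace') as fin, \
         open(out_path, 'w', newline='', encoding='utf-8') as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            writer.writerow(transform(row))

run_stage('file_1.csv', 'file_2.csv', clean)    # clean
run_stage('file_2.csv', 'file_3.csv', conform)  # conform
# file_3.csv is then bulk-loaded into the target DB tables (deliver)

Keeping the intermediate files around gives you the audit trail mentioned above.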

Table-level backup

How do you take a table-level backup (dump) in MS SQL Server 2005/2008?
You cannot use the BACKUP DATABASE command to back up a single table, unless of course the table in question is allocated to its own FILEGROUP.
What you can do, as you have suggested, is export the table data to a CSV file. Then, to get the definition of your table, you can 'script out' the CREATE TABLE statement.
You can do this within SQL Server Management Studio, by:
right-clicking the database > Tasks > Generate Scripts
You can then select the table you wish to script out and also choose to include any associated objects, such as constraints and indexes.
To get the data along with the schema, you have to choose Advanced on the Set Scripting Options tab, and in the General section set 'Types of data to script' to 'Schema and data'.
Hope this helps, but feel free to contact me directly if you require further assistance.
I am using the bulk copy utility to achieve table-level backups
to export:
bcp.exe "select * from [MyDatabase].dbo.Customer " queryout "Customer.bcp" -N -S localhost -T -E
to import:
bcp.exe [MyDatabase].dbo.Customer in "Customer.bcp" -N -S localhost -T -E -b 10000
As you can see, you can export based on any query, so you can even do incremental backups with this. Plus, it is scriptable, as opposed to the other methods mentioned here that use SSMS.
Here are the steps you need, shown in the screenshots (full-size image: http://i.imgur.com/y6ZCL.jpg). Step 5 is important if you want the data, and Step 2 is where you can select individual tables.
You can run the query below to take a backup of the existing table; it creates a new table with the existing structure of the old table, along with the data.
select * into newtablename from oldtablename
To copy just the table structure, use the below query.
select * into newtablename from oldtablename where 1 = 2
This is similar to qntmfred's solution, but using a direct table dump. This option is slightly faster (see BCP docs):
to export:
bcp "[MyDatabase].dbo.Customer " out "Customer.bcp" -N -S localhost -T -E
to import:
bcp [MyDatabase].dbo.Customer in "Customer.bcp" -N -S localhost -T -E -b 10000
If you're looking for something like MySQL's DUMP, then good news: SQL Server 2008 Management Studio added that ability.
In SSMS, just right-click on the DB in question and select Tasks > Generate Scripts. Then in the 2nd page of the options wizard, make sure to select that you'd like the data scripted as well, and it will generate what amounts to a DUMP file for you.
Create a new filegroup, put this table on it, and back up this filegroup only.
You can use the free Database Publishing Wizard from Microsoft to generate text files with SQL scripts (CREATE TABLE and INSERT INTO).
You can create such a file for a single table, and you can "restore" the complete table including the data by simply running the SQL script.
I don't know whether it will match the problem described here, but I had to take an incremental backup of a table (only newly inserted data should be copied). I designed a DTS package in which:
I fetch the new records (on the basis of a 'status' column) and transfer the data to the destination (through a 'Transform Data Task').
Then I just update the 'status' column (through an 'Execute SQL Task').
I had to set up the 'workflow' properly.
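The same fetch-then-mark pattern can also be scripted outside DTS. Below is a rough sketch using Python and pyodbc; the connection string, table, column names and status values are all hypothetical:

# copy only the rows that have not been backed up yet, then mark them
import csv
import pyodbc

conn = pyodbc.connect('DRIVER={SQL Server};SERVER=localhost;'
                      'DATABASE=MyDatabase;Trusted_Connection=yes')
cur = conn.cursor()

# 1. fetch only the new rows
cur.execute("SELECT Id, Name, CreatedAt FROM dbo.Customer WHERE Status = 'NEW'")
rows = cur.fetchall()

# 2. transfer them to the destination (here: append to a flat file)
with open('customer_increment.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(list(row))

# 3. mark the copied rows so the next run skips them
# (in a real job, mark exactly the rows you copied, e.g. by key, to avoid
#  racing with inserts that happen between steps 1 and 3)
cur.execute("UPDATE dbo.Customer SET Status = 'BACKED_UP' WHERE Status = 'NEW'")
conn.commit()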
Use SQL Server Import and Export Wizard.
In SSMS:
Open the Database Engine
Right-click the database containing the table to export
Select "Tasks"
Select "Export Data..."
Follow the Wizard
Every recovery model lets you back up a whole or partial SQL Server database or individual files or filegroups of the database. Table-level backups cannot be created.
From: Backup Overview (SQL Server)
You probably have two options, as SQL Server doesn't support table backups. Both would start with scripting the table creation. Then you can either use the Script Table - INSERT option, which will generate a lot of INSERT statements, or you can use Integration Services (DTS with 2000) or similar to export the data as CSV.
BMC Recovery Manager (formerly known as SQLBacktrack) allows point-in-time recovery of individual objects in a database (aka tables). It is not cheap but does a fantastic job:
http://www.bmc.com/products/proddocview/0,2832,19052_19429_70025639_147752,00.html
http://www.bmc.com/products/proddocview/0,2832,19052_19429_67883151_147636,00.html
If you are looking to be able to restore a table after someone has mistakenly deleted rows from it you could maybe have a look at database snapshots. You could restore the table quite easily (or a subset of the rows) from the snapshot. See http://msdn.microsoft.com/en-us/library/ms175158.aspx
A free app named SqlTableZip will get the job done.
Basically, you write any query (which, of course can also be [select * from table]) and the app creates a compressed file with all the data, which can be restored later.
Link:
http://www.doccolabs.com/products_sqltablezip.html
Handy Backup automatically makes dump files from MS SQL Server, including MSSQL 2005/2008. These dumps are table-level binary files containing exact copies of the particular database content.
To make a simple dump with Handy Backup, follow these instructions:
Install Handy Backup and create a new backup task.
On Step 2, select “MSSQL” as a data source. In the new window, mark a database to back up.
Select among the different destinations where you will store your backups.
On Step 4, select the “Full” backup option. Set up a time stamp if you need it.
Skip Step 5 unless you need to compress or encrypt the resulting dump file.
On Step 6, set up a schedule for the task to create dumps periodically (otherwise, run the task manually).
Again, skip Step 7, and give your task a name on Step 8. You have finished the task!
Now run your new task by clicking on the icon before its name, or wait for the scheduled time. Handy Backup will automatically create a dump for your database.
Then open your backup destination. You will find a folder (or a couple of folders) with your MS SQL backups. Each such folder will contain a table-level dump file, consisting of some binary tables and settings compressed into a single ZIP.
Other Databases
Handy Backup can save dumps for MySQL, MariaDB, PostgreSQL, Oracle, IBM DB2, Lotus Notes and any generic SQL database that has an ODBC driver. Some of these databases require additional steps to establish connections between the DBMS and Handy Backup.
The tools described above often dump SQL databases as a table-level SQL command sequence, making these files ready for any manual modifications you need.

Hive: what happens if I delete file that is being queried at the moment?

Let's say we have a Hive table stored on HDFS as a directory like this:
data/
|-- file1
|-- file2
|-- file3
What happens if I start long query over this directory and then delete one of the files?
I can think of 3 scenarios:
1. File descriptors are opened at the beginning and data is kept until the end of the query, even though file paths aren't available for new queries anymore.
2. Hive remembers file paths and fails the query if it cannot find deleted files.
3. Hive doesn't remember file paths and takes only files that are in the directory right now.
If Hive behaves like (2) and it isn't safe to delete the files during the query, what is the proper way to drop old data from the directory being queried?
As stated by @Shankarsh, Hive tries to coordinate its queries using a "lock" table in its metastore DB. Try running the show locks command while another session is running a long SELECT or INSERT query, and yet another session tries to ALTER the table (having to wait until it can acquire an exclusive lock), to see for yourself.
Unfortunately that will not prevent direct HDFS access to the files and directories. AFAIK there is only one type of lock in HDFS, and it's an exclusive lock used to create/append/truncate a file (or the last block in an existing file).
Typical scenario: you submit a query; Hive retrieves the list of files and file blocks at query compile time, then launches some mappers to read from these blocks; meanwhile another job requests deletion of one of the files ==> one of the mappers will crash with a FileNotFoundException (I've been there!)
Another typical scenario: ...meanwhile another job creates a new file, or appends a new block to an existing file ==> that data will never be accessed -- and that's not a bad thing by the way.
Bottom line: avoid deleting files in an HDFS directory used by a Hive table (whether managed or external) unless you can make sure that no query is currently running, or may be running soon. If you want to delete all the files at once, for a managed table, use TRUNCATE at table/partition level and let Hive do the dirty coordination stuff.
In some cases you might try a complicated trick with a temp table having a single partition, an EXCHANGE PARTITION Hive command (...coordination...), then the HDFS deletion in the temp directory, then another EXCHANGE PARTITION to return all remaining files back in place -- but of course, any query started in between would see an empty table, and that could be a problem.
I guess Hive will take a table-level lock (shared, read-only) and it won't allow any updates/deletes on the table, so ideally it won't allow data to be deleted.
Please have a look at this post as well:
Hive Locks entire database when running select on one table

Writing autosys job information to Oracle DB

Here's my situation: we have no access to the autosys server other than using the autorep command. We need to keep detailed statistics on each of our jobs. I have created some Oracle database tables that will store start/end times, exit codes, JIL, etc.
What I need to know is what is the easiest way to output the data we require (which is all available in the autosys tables that we do not have access to) to an Oracle database.
Here are the technical details of our system:
autosys version - I cannot figure out how to get this information
Oracle version - 11g
We have two separate environments - one for UAT/QA/IT and several PROD servers
Do something like the following:
Create a table with the columns you want to store. Include a key column that is auto-generated. The JIL column should be able to handle large text. Also add a column for sysdate.
Create a shell script. Inside it, do the following (a rough Python sketch of the same loop is shown after these steps):
Run "autorep -j -l0" to get all the jobs you want and put them in a file. -l0 is there to ignore duplicate jobs: if a box contains a job, then without -l0 you will get the job twice.
Create a loop and read all the job names one by one.
In the loop, set variables for job name/start time/end time/status (all of which you can get from autorep -j). Then use a variable to hold the JIL, obtained via autorep -q -j.
Append all these variable values to a flat file.
End the loop. After exiting the loop you will end up with a file containing all the job details.
Then use SQL*Loader to put the data into your Oracle table. You can hardcode a control file and use it for every run, but the content of the data file will change on every run.
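For illustration only, here is a rough Python version of the shell-script loop described above. autorep must be on the PATH, the job filter 'MYAPP%' is hypothetical, and the report parsing is deliberately naive -- adjust it to the autorep output format at your site:

import csv
import subprocess

def autorep(*args):
    # run autorep and return its stdout as text
    return subprocess.run(['autorep', *args],
                          capture_output=True, text=True, check=True).stdout

report = autorep('-j', 'MYAPP%', '-l0')          # one line per job, no box duplicates
with open('job_stats.dat', 'w', newline='') as out:
    writer = csv.writer(out, delimiter='|')
    for line in report.splitlines():
        fields = line.split()
        # skip blank lines and the report header/separator rows
        if not fields or fields[0] == 'Job' or set(fields[0]) <= set('_-'):
            continue
        job_name = fields[0]
        jil = autorep('-q', '-j', job_name)      # full JIL definition of the job
        # write job name, the raw report line (start/end/status) and the JIL
        writer.writerow([job_name, line.strip(), jil.replace('\n', ' ')])
# job_stats.dat can then be loaded into the Oracle table with SQL*Loader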
Let me know if any part is not clear.

Performing partial backup in neo4j

I have several separate, independent structures in the database. I need to do a backup for each of these structures separately, not a full backup of everything.
I am interested in whether there is a way to back up some specific part of the graph. I checked which backup strategies there are in the Neo4j documentation. There are incremental backups and full backups, but I could not find a way to extract and back up only some part of the graph, or some independent graph structure in the database.
Ideally I would define a Cypher query and get the result backed up like that. For example, in most relational databases it is possible to extract/back up a separate table or dataset (depending on the database), so that is something I am looking to do in Neo4j too: define a node label and then do a backup, or select by some other criteria.
You can use the experimental dump command along with the shell:
Example: dumping the User nodes to a users.cypher file that will contain all the Cypher statements for recreating the users later:
./bin/neo4j-shell -c 'dump MATCH (n:User) RETURN n;' > users.cypher
Related info in the documentation : http://neo4j.com/docs/stable/shell-commands.html#_dumping_the_database_or_cypher_statement_results
